Abstract. A k-noncrossing RNA structure can be identified with a k-noncrossing diagram over [n], which in turn corresponds to a vacillating tableau having at most (k − 1) rows. In this paper we derive the limit distribution of irreducible substructures by studying their corresponding vacillating tableaux. Our main result proves that the limit distribution of the numbers of irreducible substructures in k-noncrossing, σ-canonical RNA structures is determined by the density function of a Γ(−ln τ_k, 2)-distribution for some τ_k < 1.



1. Introduction and background 



In this paper we analyze the number of irreducible substructures of k-noncrossing, σ-canonical RNA structures. We prove that the numbers of irreducible substructures of k-noncrossing, σ-canonical RNA structures are, in the limit of long sequence length, given via the density function of a Γ(−ln τ_k, 2)-distribution.

An RNA structure is the helical configuration of its primary sequence, i.e. the sequence of nucleotides A, G, U and C, together with Watson-Crick (A-U, G-C) and (U-G) base pairs. Since the structure of an RNA molecule is oftentimes tantamount to its function, it is of key importance. The concept of irreducibility in RNA structures is of central importance, since the minimum free energy (mfe) configuration of a given RNA molecule is determined by its largest irreducible substructure.
Three decades ago, Waterman […] pioneered the combinatorics of RNA secondary structures, an RNA structure class exhibiting only noncrossing bonds. Secondary structures can readily be identified with Motzkin-paths satisfying some minimum height and plateau-length, see Figure 1. The latter restrictions arise from biophysical constraints due to mfe and the limited flexibility of chemical bonds.

Figure 1. The phenylalanine tRNA secondary structure, as generated by the computer folding algorithm cross [12], represented as planar graph, diagram and Motzkin-path. The structure has arc-length ≥ 8 and stack-length ≥ 3 and uniquely corresponds to a Motzkin-path with minimum height 3 and minimum plateau-length 7.

It is clear from this particular bijection that irreducible substructures in RNA secondary structures are closely related to the number of nontrivial returns, i.e. the number of non-endpoints at which the Motzkin-path meets the x-axis.
For Dyck-paths this question has been studied by Shapiro […], who showed that the expected number of nontrivial returns of Dyck-paths of length 2n equals 2(n − 1)/(n + 2). Subsequently, Shapiro and Cameron [1] derived the expectation and variance of the number of nontrivial returns for generalized Dyck-paths from (0, 0) to ((t + 1)n, 0):

(1.1) $$\mathbb{E}[\xi_n]=\frac{2(n-1)}{tn+2}\qquad\text{and}\qquad \mathbb{V}[\xi_n]=\frac{2tn(n-1)\bigl((t+1)n+1\bigr)}{(tn+2)^2(tn+3)}.$$
The bijection between Dyck-paths of length 2n and the triangulations of the (n + 2)-gon, due to Stanley [22], implies a combinatorial proof for E[ξ_n]. An alternative approach is to employ the Riordan matrix [20], an infinite, lower triangular matrix L = (l_{n,k})_{n,k≥0} = (g(z), f(z)), where g(z) = Σ_{n≥0} g_n z^n and f(z) = Σ_{n≥0} f_n z^n with f_0 = 0, f_1 ≠ 0, such that Σ_{n≥k} l_{n,k} z^n = g(z) f(z)^k.
Clearly,

$$C(z)=\sum_{n\ge 0}C_n z^n=\frac{1-\sqrt{1-4z}}{2z},\qquad\text{where } C_n=\frac{1}{n+1}\binom{2n}{n},$$

is the generating function of Dyck-paths. Let C_{n,j} denote the number of Dyck-paths of length 2n with j nontrivial returns. We consider the Riordan matrix L = (C_{n,j})_{n,j≥0} = (zC(z), zC(z)) and extract the coefficients from its generating function (zC(z))^{j+1} by Lagrange inversion. Setting f(z) = zG(f(z)) with f(z) = C(z) − 1 and G(z) = (1 + z)^2, we obtain an explicit expression for C_{n,j}, where Σ_{j≥0} C_{n,j} = C_n. From this we immediately compute

$$\mathbb{E}[\xi_n]=\sum_{j\ge 1} j\,\frac{C_{n,j}}{C_n}\qquad\text{and}\qquad \mathbb{V}[\xi_n]=\sum_{j\ge 1} j^2\,\frac{C_{n,j}}{C_n}-\Bigl(\sum_{j\ge 1} j\,\frac{C_{n,j}}{C_n}\Bigr)^2,$$

from which the expression of eq. (1.1) for t = 1 follows.
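As a quick plausibility check of the computation above, the following Python sketch enumerates Dyck-paths of small semilength by brute force and compares the counts C_{n,j} with the closed form C_{n,j} = ((j + 1)/n)·binom(2n − j − 2, n − j − 1). This closed form is our own, obtained by applying Lagrange inversion to P(z) = zC(z), which satisfies P = z/(1 − P); it is not quoted from the text. The sketch then checks the t = 1 case of eq. (1.1), i.e. E[ξ_n] = 2(n − 1)/(n + 2) and V[ξ_n] = 2n(n − 1)(2n + 1)/((n + 2)^2(n + 3)). All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def returns_histogram(n):
    """Brute force: count Dyck paths of semilength n by their number of nontrivial returns."""
    hist = {}
    for steps in product((1, -1), repeat=2 * n):
        height, returns, ok = 0, 0, True
        for pos, s in enumerate(steps, start=1):
            height += s
            if height < 0:
                ok = False
                break
            if height == 0 and pos < 2 * n:  # touches the x-axis away from the endpoints
                returns += 1
        if ok and height == 0:
            hist[returns] = hist.get(returns, 0) + 1
    return hist


def closed_form(n, j):
    """C_{n,j} = (j+1)/n * binom(2n-j-2, n-j-1), from Lagrange inversion of P = z/(1-P)."""
    if j > n - 1:
        return 0
    return Fraction(j + 1, n) * comb(2 * n - j - 2, n - j - 1)


def mean_and_variance(n):
    """Exact mean and variance of the number of nontrivial returns, from the closed form."""
    counts = {j: closed_form(n, j) for j in range(n)}
    total = sum(counts.values())  # equals the Catalan number C_n
    mean = sum(j * c for j, c in counts.items()) / total
    second = sum(j * j * c for j, c in counts.items()) / total
    return mean, second - mean ** 2


for n in range(1, 9):
    assert returns_histogram(n) == {j: closed_form(n, j) for j in range(n)}
    mean, var = mean_and_variance(n)
    assert mean == Fraction(2 * (n - 1), n + 2)
    assert var == Fraction(2 * n * (n - 1) * (2 * n + 1), (n + 2) ** 2 * (n + 3))
print("closed form and eq. (1.1) with t = 1 agree for n = 1..8")
```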

In Section 3 we consider the bivariate generating function directly, which relates to the Riordan matrix as follows (shown here for ordinary Dyck-paths):

$$\sum_{n\ge 0}\sum_{j\ge 0} C_{n,j}\,z^n u^j=\sum_{j\ge 0} zC(z)\,\bigl(zC(z)\bigr)^j u^j=\frac{zC(z)}{1-u\,zC(z)}.$$

Our main idea is to derive the bivariate generating function from the Riordan matrix employing irreducible paths, and to establish a discrete limit law via singularity analysis. This is done, however, for the far more general class of C-tableaux introduced in Section 2: in Theorem 7 we show that the limit distribution of nontrivial returns for these vacillating tableaux is given in terms of the density function of a Γ(λ, r)-distribution, which is a new result already for Motzkin-paths. For restricted Motzkin-paths satisfying specific height and plateau-length constraints, the Riordan matrix Ansatz does not work "directly", since it is incompatible with the inductive decomposition of restricted Motzkin-paths. Instead we introduce the notion of irreducible paths and express the Riordan matrix in terms of the latter, see Lemma 2. This Ansatz allows us to compute the generating function of irreducible paths by setting one indeterminate of the bivariate generating function to one. The framework developed in Section 3 and Section 4 in fact works as long as the generating function of the particular path-class is explicitly known and has a singular expansion. For instance, for nontrivial returns of Motzkin-paths with height ≥ 3 and plateau-length ≥ 3 we have lim_{n→∞} E[η_n] ≈ 0.8625 and lim_{n→∞} V[η_n] ≈ 1.2343.

Indeed, RNA structures are far more complex than secondary structures: they exhibit additional, cross-serial nucleotide interactions […]. These interactions were observed in natural RNA structures as well as via comparative sequence analysis [28]. They are called pseudoknots, see Figure 2, and widely occur in functional RNA, for instance in RNase P RNA […] as well as in ribosomal RNA [14]. RNA pseudoknots are also conserved in the catalytic core of group I introns. In plant viral RNAs pseudoknots mimic tRNA structure, and in vitro RNA evolution [23] experiments have produced families of RNA structures with pseudoknot motifs when binding HIV-1 reverse transcriptase.




Figure 2. The hepatitis delta virus (HDV) pseudoknot structure and its diagram representation. Top: the structure as folded by cross [12] for k = 3 and minimum stack size 3; bottom: the corresponding diagram representation.



Combinatorially, cross-serial interactions are tantamount to crossing bonds. To this end, RNA pseudoknot structures have been modeled via k-noncrossing diagrams [5], i.e. labeled graphs over the vertex set [n] = {1, . . . , n} with vertex degrees ≤ 1. Diagrams are represented by drawing their vertices 1, . . . , n on a horizontal line and their arcs (i, j), where i < j, in the upper half-plane. Here the degree of i refers to the number of non-horizontal arcs incident to i, i.e. the backbone of the primary sequence is not accounted for. The vertices and arcs correspond to nucleotides and Watson-Crick (A-U, G-C) and (U-G) base pairs, respectively, see Figure 2.

Figure 3. k-noncrossing diagrams: we display a 4-noncrossing diagram with arc-length λ ≥ 4 and σ ≥ 1 (top), where the arc set {(1, 7), (3, 9), (5, 10)} is a 3-crossing, the arc (2, 6) has length 4 and (5, 10) has stack-length 1. Below, we display a 3-noncrossing diagram with λ ≥ 4 and σ ≥ 2, where (2, 6) has arc-length 4 and the stack ((1, 7), (2, 6)) has stack-length 2.

Diagrams are characterized via their maximum number of mutually crossing arcs, k − 1, their minimum arc-length, λ, and their minimum stack-length, σ. A k-crossing is a set of k distinct arcs (i_1, j_1), (i_2, j_2), . . . , (i_k, j_k) with the property i_1 < i_2 < · · · < i_k < j_1 < j_2 < · · · < j_k. A diagram without any k-crossings is called a k-noncrossing diagram. The length of an arc (i, j) is j − i, and a stack of length σ is a sequence of "parallel" arcs of the form

((i, j), (i + 1, j − 1), . . . , (i + (σ − 1), j − (σ − 1))).
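The definitions of k-crossings, arc-lengths and stacks translate directly into code. The Python sketch below is a brute-force illustration for small diagrams only; it assumes arcs are given as pairs (i, j) with i < j and that every vertex has degree at most one. The helper names (crosses, max_crossing, stacks, is_structure) are hypothetical.

```python
from itertools import combinations


def crosses(a, b):
    """Arcs a and b cross iff, after ordering by origin, i1 < i2 < j1 < j2."""
    (i1, j1), (i2, j2) = sorted([a, b])
    return i1 < i2 < j1 < j2


def max_crossing(arcs):
    """Size of the largest set of mutually crossing arcs (brute force)."""
    best = 1 if arcs else 0
    for size in range(2, len(arcs) + 1):
        if any(all(crosses(a, b) for a, b in combinations(group, 2))
               for group in combinations(arcs, size)):
            best = size
        else:
            break  # no mutually crossing set of this size, hence none larger
    return best


def stacks(arcs):
    """Maximal stacks ((i, j), (i + 1, j - 1), ...), each listed from its outermost arc."""
    arc_set = set(arcs)
    result = []
    for i, j in sorted(arcs):
        if (i - 1, j + 1) in arc_set:
            continue  # (i, j) is not the outermost arc of its stack
        stack = [(i, j)]
        while (stack[-1][0] + 1, stack[-1][1] - 1) in arc_set:
            stack.append((stack[-1][0] + 1, stack[-1][1] - 1))
        result.append(stack)
    return result


def is_structure(arcs, k, min_arc_length, min_stack_length):
    """k-noncrossing, all arcs of length >= min_arc_length, all stacks of length >= min_stack_length."""
    return (max_crossing(arcs) < k
            and all(j - i >= min_arc_length for i, j in arcs)
            and all(len(s) >= min_stack_length for s in stacks(arcs)))
```

For the top diagram of Figure 3, max_crossing([(1, 7), (3, 9), (5, 10)]) returns 3, so this arc set alone already rules out 3-noncrossing diagrams.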

A subdiagram of a k-noncrossing diagram is a subgraph over a subset M ⊂ [n] of consecutive vertices that starts with an origin and ends with a terminus of some arc. Let (i_1, . . . , i_m) be a sequence of consecutive isolated points. We call (i_1, . . . , i_m) interior if and only if there exists some arc (j_1, j_2) such that j_1 < i_1 ≤ i_m < j_2 holds, and exterior otherwise. Any exterior sequence of consecutive, isolated vertices is called a gap. A diagram or subdiagram is called irreducible if it cannot be decomposed into a sequence of gaps and subdiagrams, see Figure 4. Accordingly, any k-noncrossing diagram can be uniquely decomposed into an alternating sequence of gaps and irreducible subdiagrams. In fact, irreducibility is quite common for natural RNA pseudoknot structures, see Figure 5.
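Because no arc can span the boundary between two consecutive pieces, the decomposition into gaps and irreducible subdiagrams can be read off in one left-to-right sweep that tracks the number of open arcs. The sketch below assumes arcs (i, j) with i < j and vertex degree at most one; the function name decompose and its output format are our own choices.

```python
def decompose(n, arcs):
    """Split a diagram over [n] into gaps and irreducible subdiagrams, left to right.

    Returns a list of ("gap", first, last) and ("irreducible", first, last) triples.
    """
    origin = {i for i, _ in arcs}
    terminus = {j for _, j in arcs}
    pieces, open_arcs, start = [], 0, None
    for v in range(1, n + 1):
        if open_arcs == 0:
            start = v  # a new piece begins at v
        open_arcs += (v in origin) - (v in terminus)
        if open_arcs == 0:  # no arc spans the boundary between v and v + 1
            kind = "irreducible" if v in terminus else "gap"
            if kind == "gap" and pieces and pieces[-1][0] == "gap":
                pieces[-1] = ("gap", pieces[-1][1], v)  # extend the current gap
            else:
                pieces.append((kind, start, v))
    return pieces
```

For example, decompose(12, [(1, 4), (3, 6), (9, 12), (10, 11)]) yields an irreducible subdiagram over (1, 6), the gap (7, 8) and an irreducible subdiagram over (9, 12), mirroring the top panel of Figure 4 (the arcs themselves are made up for illustration). A diagram over [n] is irreducible exactly when the result is [("irreducible", 1, n)].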
Figure 4. Subdiagrams, gaps and irreducibility: a diagram (top), decomposed into the subdiagram over (1, 6), the gap (7, 8) and the subdiagram over (9, 12); a gap (middle) and an irreducible diagram over (1, 12) (bottom).



Figure 5. mRNA-Eca: the irreducible pseudoknot structure of the regulatory region of the α ribosomal protein operon.



We call a k-noncrossing diagram with arc-length ≥ 4 and stack-length ≥ σ a k-noncrossing, σ-canonical RNA structure, see Figure 3. We accordingly adopt the notions of gap, substructure and irreducibility for RNA structures.

Our main result is Theorem 6, which proves that the numbers of irreducible substructures are, in the limit of long sequence length, given via the density function of a Γ(−ln τ_k, 2)-distribution. Furthermore, we show that the probability generating function of the limit distribution is given by q(u) = u(1 − τ_k)^2/(1 − τ_k u)^2, where τ_k is expressed in terms of the generating function of k-noncrossing, σ-canonical RNA structures [16] and its dominant singularity α_k. In Figure 6 we compare our analytic results with mfe secondary structures and mfe 3-noncrossing structures generated by computer folding algorithms [21], [12], respectively. The data indicate that already for n = 75 the limit distribution of Theorem 6 provides a good fit for both structure classes.
Figure 6. For n = 75 the lhs displays the distribution of irreducible substructures obtained by folding 10^… random sequences into their RNA secondary structures [24] (red), and the scaled density function of a Γ(−ln(0.2241), 2)-distribution (blue) sampled at the positive integers. The rhs shows this distribution obtained by folding 9 × 10^… random sequences into 3-noncrossing, 3-canonical structures [12] (red), and the scaled density function of a Γ(−ln(0.0167), 2)-distribution (blue) derived from Theorem 6.



The paper is organized as follows: in Section 2 we recall some basic combinatorial background. Of particular importance here is the bijection of Theorem 1 between k-noncrossing diagrams and vacillating tableaux with at most (k − 1) rows [1]. In Section 3 we present all key ideas and derive the limit distribution of ∗-tableaux. In Section 4 we study the limit distribution of nontrivial returns using the framework developed in Section 3.



A Ferrers diagram (shape) is a collection of squares arranged in left-justified rows with weakly decreasing number of boxes in each row. A standard Young tableau (SYT) is a filling of the squares by numbers which is strictly increasing in each row and in each column. We refer to standard Young tableaux as Young tableaux, see Figure 7. A vacillating tableau V_λ^{2n} of shape λ and length 2n is a sequence of Ferrers diagrams (λ^0, λ^1, . . . , λ^{2n}) such that (i) λ^0 = ∅ and λ^{2n} = λ, and (ii) (λ^{2i−1}, λ^{2i}) is derived from λ^{2i−2}, for 1 ≤ i ≤ n, by one of the following operations: (∅, ∅): do nothing twice; (−□, ∅): first remove a square, then do nothing; (∅, +□): first do nothing, then add a square; (±□, ±□): add/remove a square at the odd and even steps, respectively. We denote the set of vacillating tableaux by V_λ^{2n}.

Figure 7. Ferrers diagram and Young tableau.

Figure 8. A vacillating tableau of shape ∅ and length 12, with steps (∅, +□), (+□, +□), (−□, ∅), (−□, +□), (−□, ∅), (−□, ∅).

The RSK-algorithm is a process of row-inserting elements into a Young tableau. Suppose we want to insert q into a standard Young tableau of shape λ. Let x_{i,j} denote the element in the i-th row and j-th column of the Young tableau. Let j be the largest integer such that x_{1,j−1} < q (if x_{1,1} > q, then j = 1). If x_{1,j} does not exist, then simply add q at the end of the first row. Otherwise, replace x_{1,j} by q. Next insert x_{1,j} into the second row following the above procedure, and continue until an element is inserted at the end of a row. As a result, we obtain a new standard Young tableau with q included. For instance, inserting the sequence 5, 2, 4, 1, 6, 3, starting with an empty shape, yields the standard Young tableaux displayed in Figure 9.
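The row-insertion procedure just described can be written in a few lines of Python. In this sketch, which assumes distinct entries, a tableau is represented as a list of increasing rows, and bisect_right locates the first entry of a row that exceeds q; the names rsk_insert and insert_sequence are hypothetical.

```python
from bisect import bisect_right


def rsk_insert(tableau, q):
    """Row-insert q into a standard Young tableau given as a list of increasing rows.

    Returns a new tableau; the input is left unchanged. Entries are assumed distinct.
    """
    rows = [row[:] for row in tableau]
    for row in rows:
        j = bisect_right(row, q)  # position of the first entry larger than q
        if j == len(row):
            row.append(q)  # q exceeds every entry: place it at the end of the row
            return rows
        row[j], q = q, row[j]  # q bumps row[j], which moves on to the next row
    rows.append([q])  # the last bumped element starts a new row
    return rows


def insert_sequence(seq):
    """Insert the elements of seq one after the other, starting from the empty tableau."""
    tableau = []
    for q in seq:
        tableau = rsk_insert(tableau, q)
    return tableau


print(insert_sequence([5, 2, 4, 1, 6, 3]))  # [[1, 3, 6], [2, 4], [5]]
```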
The RSK-insertion algorithm has an inverse […], see Lemma 1 below, which will be of central importance for constructing a vacillating tableau from a tangled diagram.

Figure 9. RSK-insertion of the elements 5, 2, 4, 1, 6, 3. The insertion of the above sequence successively constructs a standard Young tableau.

Lemma 1. Suppose we are given two shapes λ^i ⊊ λ^{i−1}, which differ by exactly one square. Let T_{i−1} and T_i be SYT of shape λ^{i−1} and λ^i, respectively. Given λ^i and T_{i−1}, there exists a unique j contained in T_{i−1} and a unique tableau T_i such that T_{i−1} is obtained from T_i by inserting j via the RSK-algorithm.
In addition, Lemma 1 explicitly constructs this unique j such that T_{i−1} is obtained from T_i by inserting j via the RSK-algorithm, see Figure 10.

Figure 10. How Lemma 1 works: given the Young tableau T_{i−1} and the shape λ^i, we show how to find the unique j (here j = 1) such that T_{i−1} is obtained from T_i by inserting 1 via the RSK-algorithm.
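The inverse step of Lemma 1 can be sketched as reverse bumping: remove the entry at the end of the row in which λ^{i−1} and λ^i differ, then push it upward, each time replacing the largest smaller entry of the row above. We assume that this standard reverse row-insertion is what the text calls RSK-extraction; the Python function below (a hypothetical name, rows indexed from 0) returns j together with T_i.

```python
from bisect import bisect_left


def rsk_extract(tableau, r):
    """Reverse row-insertion: remove the corner at the end of row r (0-based).

    Returns (j, smaller_tableau) such that row-inserting j into smaller_tableau
    gives back `tableau`. Entries are assumed distinct.
    """
    rows = [row[:] for row in tableau]
    if r + 1 < len(rows) and len(rows[r + 1]) >= len(rows[r]):
        raise ValueError("row r does not end in a removable corner")
    x = rows[r].pop()
    if not rows[r]:
        rows.pop(r)  # an emptied row can only be the last one
    for i in range(r - 1, -1, -1):
        p = bisect_left(rows[i], x) - 1  # largest entry of row i smaller than x
        rows[i][p], x = x, rows[i][p]  # x takes its place; the old entry moves up
    return x, rows
```

For instance, rsk_extract([[1, 3, 6], [2, 4], [5]], 2) returns (3, [[1, 4, 6], [2, 5]]), and row-inserting 3 into [[1, 4, 6], [2, 5]] gives back the original tableau.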



2.1. From diagrams to vacillating tableaux and back. RNA tertiary interactions, in particular the interactions between helical and non-helical regions, motivate the notion of tangled diagrams. The key feature of tangled diagrams (tangles) is to allow each nucleotide to engage in two interactions: one being a Watson-Crick or G-U base pair and the other being a hydrogen bond. A tangled diagram, G_n, over [n] is obtained by drawing its arcs in the upper half-plane, having vertices of degree at most two and a specific notion of crossings and nestings [4].

Figure 11. Tangled diagrams: the first tangled diagram represents the key bonds of the hammerhead ribozyme and the second tangle represents key bonds of the catalytic core region of the Group I self-splicing intron [2].

The inflation of a tangle is a diagram obtained by "splitting" each vertex j of degree two into two vertices j and j′ of degree one, see Figure 12. Accordingly, a tangled diagram over [n] with ℓ vertices of degree two is expanded into a diagram over n + ℓ vertices. Obviously, the inflation has a unique inverse, obtained by simply identifying each pair of vertices j and j′. By construction, the inflation preserves the maximal number of mutually crossing and nesting arcs [3]. Given a k-noncrossing tangle, we can construct a vacillating tableau using the following algorithm: going from right to left, we take one of three types of actions, namely we RSK-insert, extract (via Lemma 1) or do nothing, depending on whether we are given a terminus, origin or isolated point of the inflated tangle, see Figure 13.

Figure 12. The inflation of the first tangled diagram in Figure 11.

Figure 13. From tangled diagrams to vacillating tableaux via the inflation: for the first tangled diagram in Figure 11 we present its inflation and its unique vacillating tableau.

In fact, the above algorithm has a unique inverse: from a vacillating tableau we can derive a unique tangle, see Figure 14. For +□ steps one simply inserts into the tableau, one does nothing for ∅ steps, and one RSK-extracts (Lemma 1) for −□ steps. As a result (see Figure 13 and Figure 14) we derive the following theorem [4].

Theorem 1. There exists a bijection between k-noncrossing tangled diagrams and vacillating tableaux of type V_∅^{2n} having shapes λ^i with less than k rows.



Theorem 1 implies bijections between various subclasses of vacillating tableaux and subclasses of tangles. Most notably, there is the bijection [3] between k-noncrossing diagrams and vacillating tableaux (of empty shape) such that (i) λ^0 = ∅ and λ^{2n} = ∅, and (ii) (λ^{2i−1}, λ^{2i}) is derived from λ^{2i−2}, for 1 ≤ i ≤ n, by one of the following operations: (∅, ∅): do nothing twice; (−□, ∅): first remove a square, then do nothing; (∅, +□): first do nothing, then add a square. We refer to the latter as ⋆-tableaux. Obviously, ⋆-tableaux are completely determined by the sequence of shapes

(λ^2, λ^4, . . . , λ^{2n−2}).
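For diagrams (as opposed to general tangles) the construction is simple enough to sketch. The Python code below follows the right-to-left scan described in Section 2.1 under the following assumptions of ours: at the terminus b of an arc (a, b) the origin a is RSK-inserted, and at an origin a the entry a, which is then the largest entry and hence sits at a corner, is deleted. It returns the even-indexed shapes (λ^0, λ^2, . . . , λ^{2n}) as tuples of row lengths; the row-insertion helper is repeated so that the block runs on its own, and all names are hypothetical.

```python
from bisect import bisect_right


def rsk_insert(tableau, q):
    """RSK row-insertion of q into a tableau given as a list of increasing rows."""
    rows = [row[:] for row in tableau]
    for row in rows:
        j = bisect_right(row, q)
        if j == len(row):
            row.append(q)
            return rows
        row[j], q = q, row[j]
    rows.append([q])
    return rows


def shapes_of_diagram(n, arcs):
    """Even-indexed shapes (lambda^0, lambda^2, ..., lambda^{2n}) of a diagram over [n]."""
    partner = {}
    for a, b in arcs:
        partner[a], partner[b] = b, a
    tableau, shapes = [], [()]  # shapes[0] is lambda^{2n}, the empty shape
    for v in range(n, 0, -1):  # scan from right to left
        if v in partner and partner[v] < v:  # v is a terminus: insert its origin
            tableau = rsk_insert(tableau, partner[v])
        elif v in partner:  # v is an origin: v is the largest entry, remove it
            row = next(r for r in tableau if r[-1] == v)
            row.pop()
            tableau = [r for r in tableau if r]
        shapes.append(tuple(len(r) for r in tableau))  # isolated v: shape unchanged
    return shapes[::-1]  # now shapes[i] is lambda^{2i}


def max_rows(shapes):
    return max(len(s) for s in shapes)
```

For the 3-crossing {(1, 7), (3, 9), (5, 10)} over [10] the maximal number of rows is 3, while a single pair of crossing arcs produces two rows and a pair of nested arcs only one; this matches the correspondence between k-noncrossing diagrams and tableaux with less than k rows.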



2.2. k-noncrossing RNA structures. The combinatorics of k-noncrossing RNA pseudoknot structures has been derived in [8], [9]. The set (number) of k-noncrossing, σ-canonical RNA structures over [n] is denoted by 𝒯_{k,σ}(n) (T_{k,σ}(n)), and we let f_k(n, ℓ) denote the number of k-noncrossing diagrams over [n] with arbitrary arc-length and ℓ isolated vertices.

Figure 14. From vacillating tableaux to tangled diagrams: for +□ steps one simply inserts into the tableau, does nothing for ∅ steps and RSK-extracts (Lemma 1) for −□ steps. The blue numbers 2′, 1, 2, 4′ are obtained by RSK-extraction, corresponding to the "−□" steps for i = 3, 4, 5, 6. The resulting arcs are (2′, 3), (1, 4), (2, 5), (4′, 6) over the vertex set 1, 2, 2′, 3, 4, 4′, 5, 6.

It follows from Theorem 1 that the number of k-noncrossing matchings on [2n] equals the number of walks from (k − 1, k − 2, . . . , 1) to itself that stay inside the Weyl chamber x_1 > x_2 > · · · > x_{k−1} > 0 with steps ±e_i, 1 ≤ i ≤ k − 1. The latter is given by Grabiner et al. [7]; it is exactly the situation λ = μ = (k − 1, k − 2, . . . , 1) of equation (38) in [7]. As shown in detail in [8, Lemma 2],

(2.1) $$\sum_{n\ge 0} f_k(2n,0)\,\frac{x^{2n}}{(2n)!}=\det\bigl[I_{i-j}(2x)-I_{i+j}(2x)\bigr]\Big|_{i,j=1}^{k-1},$$

(2.2) $$\sum_{n\ge 0}\sum_{\ell=0}^{n} f_k(n,\ell)\,y^{\ell}\,\frac{x^{n}}{n!}=e^{xy}\,\det\bigl[I_{i-j}(2x)-I_{i+j}(2x)\bigr]\Big|_{i,j=1}^{k-1},$$

where I_r(2x) = Σ_{j≥0} x^{2j+r}/(j!(r + j)!) denotes the hyperbolic Bessel function of the first kind of order r. In particular, for k = 2 and k = 3 we have the formulas

(2.3) $$f_2(n,\ell)=\binom{n}{\ell}C_{\frac{n-\ell}{2}}\qquad\text{and}\qquad f_3(n,\ell)=\binom{n}{\ell}\Bigl[C_{\frac{n-\ell}{2}+2}\,C_{\frac{n-\ell}{2}}-C_{\frac{n-\ell}{2}+1}^{2}\Bigr],$$

where C_m denotes the m-th Catalan number and both expressions are zero if n − ℓ is odd.
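Eq. (2.3) can be checked against brute-force enumeration for small n. The sketch below (hypothetical helper names) counts k-noncrossing diagrams over [n] with exactly ℓ isolated vertices by listing all matchings on the remaining vertices; it is exponential and only meant for n ≤ 8 or so.

```python
from itertools import combinations
from math import comb


def catalan(m):
    return comb(2 * m, m) // (m + 1)


def matchings(points):
    """Yield all perfect matchings of the tuple `points` as lists of arcs."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for idx, partner in enumerate(rest):
        for m in matchings(rest[:idx] + rest[idx + 1:]):
            yield [(first, partner)] + m


def has_k_crossing(arcs, k):
    """True if some k arcs are mutually crossing."""
    return any(all(a[0] < b[0] < a[1] < b[1] for a, b in combinations(group, 2))
               for group in combinations(sorted(arcs), k))


def f_brute(k, n, ell):
    """Number of k-noncrossing diagrams over [n] with exactly ell isolated vertices."""
    total = 0
    for isolated in combinations(range(1, n + 1), ell):
        paired = tuple(x for x in range(1, n + 1) if x not in isolated)
        if len(paired) % 2 == 0:
            total += sum(1 for m in matchings(paired) if not has_k_crossing(m, k))
    return total


def f2(n, ell):
    """Eq. (2.3) for k = 2."""
    return comb(n, ell) * catalan((n - ell) // 2) if (n - ell) % 2 == 0 else 0


def f3(n, ell):
    """Eq. (2.3) for k = 3."""
    if (n - ell) % 2:
        return 0
    m = (n - ell) // 2
    return comb(n, ell) * (catalan(m + 2) * catalan(m) - catalan(m + 1) ** 2)


for n in range(9):
    for ell in range(n + 1):
        assert f_brute(2, n, ell) == f2(n, ell)
        assert f_brute(3, n, ell) == f3(n, ell)
print("eq. (2.3) agrees with brute force for n <= 8")
```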



12 EMMA Y. JIN AND CHRISTIAN M. REIDYS * 



In view of f_k(n, ℓ) = binom(n, ℓ) f_k(n − ℓ, 0), everything can be reduced to matchings, where we have the following situation: there exists an asymptotic approximation of the determinant of hyperbolic Bessel functions for general k due to [13], and employing the subtraction-of-singularities principle [17] one can prove [13]

(2.4) $$\forall k\in\mathbb{N}:\quad f_k(2n,0)\sim c_k\,n^{-((k-1)^2+(k-1)/2)}\,\bigl(2(k-1)\bigr)^{2n},\qquad\text{where } c_k>0.$$
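The exponent and growth rate in eq. (2.4) can be observed numerically for k = 2 and k = 3, where eq. (2.3) provides closed forms. The sketch below computes f_k(2n, 0)·n^{(k−1)^2+(k−1)/2}/(2(k − 1))^{2n} in log space to avoid floating-point overflow. For k = 2 the limit is 1/√π by the standard asymptotics of the Catalan numbers; for k = 3 we note (our own computation, not from the text) the exact identity C_{n+2}C_n − C_{n+1}^2 = 6C_nC_{n+1}/((n + 2)(n + 3)), which gives c_3 = 24/π.

```python
from math import comb, exp, log, pi, sqrt


def catalan(m):
    return comb(2 * m, m) // (m + 1)


def f_matchings(k, n):
    """f_k(2n, 0) from eq. (2.3); only k = 2 and k = 3 are covered here."""
    if k == 2:
        return catalan(n)
    if k == 3:
        return catalan(n + 2) * catalan(n) - catalan(n + 1) ** 2
    raise ValueError("closed form only used for k = 2, 3")


def scaled(k, n):
    """f_k(2n, 0) * n^((k-1)^2 + (k-1)/2) / (2(k-1))^(2n), evaluated in log space."""
    exponent = (k - 1) ** 2 + (k - 1) / 2
    return exp(log(f_matchings(k, n)) + exponent * log(n) - 2 * n * log(2 * (k - 1)))


for n in (250, 500, 1000, 2000):
    print(n, scaled(2, n) * sqrt(pi), scaled(3, n) * pi / 24)  # both ratios tend to 1
```

At n = 2000 both scaled values are within about one percent of these limits.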

Let F_k(z) = Σ_{n≥0} f_k(2n, 0) z^{2n} denote the generating function of k-noncrossing matchings. Setting

$$w_\sigma(x)=\frac{x^{2\sigma-2}}{1-x^2+x^{2\sigma}}\qquad\text{and}\qquad v_\sigma(x)=1-x+w_\sigma(x)\,x^2+w_\sigma(x)\,x^3+w_\sigma(x)\,x^4,$$

we can now state the following result [18].

Theorem 2. Let k, σ ∈ ℕ, where k ≥ 2 and σ ≥ 3, let x be an indeterminate and ρ_k = 1/(2(k − 1)) the dominant, positive real singularity of F_k(z). Then T_{k,σ}(x), the generating function of k-noncrossing, σ-canonical structures, is given by

(2.5) $$T_{k,\sigma}(x)=\frac{1}{v_\sigma(x)}\,F_k\!\left(\frac{\sqrt{w_\sigma(x)}\,x}{v_\sigma(x)}\right).$$

Furthermore,

(2.6) $$T_{k,\sigma}(n)\sim c_k\,n^{-(k-1)^2-\frac{k-1}{2}}\left(\frac{1}{\gamma_{k,\sigma}}\right)^{n},\qquad\text{for } k=2,3,4,\dots,9,$$

holds, where γ_{k,σ} is the minimal positive real solution of the equation √(w_σ(x)) x/v_σ(x) = ρ_k = 1/(2(k − 1)).
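The constant γ_{k,σ} of Theorem 2 is easy to obtain numerically. The following sketch assumes the definitions w_σ(x) = x^{2σ−2}/(1 − x^2 + x^{2σ}) and v_σ(x) = 1 − x + w_σ(x)(x^2 + x^3 + x^4) from the statement preceding Theorem 2, scans for the first x at which √(w_σ(x)) x/v_σ(x) reaches ρ_k = 1/(2(k − 1)), and refines it by bisection. The scan assumes that the root lies between 10^{−3} and 1; the function names are hypothetical.

```python
from math import sqrt


def w(x, sigma):
    """w_sigma(x) = x^(2 sigma - 2) / (1 - x^2 + x^(2 sigma)) (assumed form)."""
    return x ** (2 * sigma - 2) / (1 - x ** 2 + x ** (2 * sigma))


def v(x, sigma):
    """v_sigma(x) = 1 - x + w_sigma(x) (x^2 + x^3 + x^4) (assumed form)."""
    return 1 - x + w(x, sigma) * (x ** 2 + x ** 3 + x ** 4)


def theta(x, sigma):
    """Inner function sqrt(w_sigma(x)) * x / v_sigma(x) of eq. (2.5)."""
    return sqrt(w(x, sigma)) * x / v(x, sigma)


def gamma_ks(k, sigma, step=1e-3, tol=1e-12):
    """Smallest x in (0, 1) with theta(x) = rho_k = 1/(2(k-1)): coarse scan, then bisection."""
    rho = 1 / (2 * (k - 1))
    lo = step
    while theta(lo + step, sigma) < rho:
        lo += step
        if lo >= 1:
            raise ValueError("no root found in (0, 1)")
    hi = lo + step  # invariant: theta(lo) < rho <= theta(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if theta(mid, sigma) < rho:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(1 / gamma_ks(2, 3), 1 / gamma_ks(3, 3))  # growth rates 1/gamma in eq. (2.6)
```

By eq. (2.6), 1/gamma_ks(k, sigma) is then the exponential growth rate of T_{k,σ}(n).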



Via Theorem 1, each k-noncrossing, σ-canonical structure corresponds to a unique ⋆-tableau. We refer to the set of these tableaux as C-tableaux.



2.3. Singularity analysis. In view of Theorem 2 it is of interest to deduce relations between the coefficients from the equality of generating functions. The class of theorems that deal with this deduction are called transfer theorems [6]. We use the notation

(2.7) $$\bigl(f(z)=O(g(z))\ \text{as } z\to\rho\bigr)\iff\bigl(f(z)/g(z)\ \text{is bounded as } z\to\rho\bigr),$$

and if we write f(z) = O(g(z)) it is implicitly assumed that z tends to a (unique) singularity. [z^n] f(z) denotes the coefficient of z^n in the power series expansion of f(z) around 0.
Theorem 3. […] Let f(z), g(z) be D-finite functions with unique dominant singularity ρ and suppose f(z) = O(g(z)) for z → ρ. Then we have

(2.8) $$[z^n]f(z)\le K\Bigl(1-O\bigl(\tfrac{1}{n}\bigr)\Bigr)[z^n]g(z),$$

where K is some constant.



Theorem 3 and eq. (2.4) imply

(2.9) $$F_k(z)=\begin{cases}O\Bigl(\bigl(1-\frac{z}{\rho_k}\bigr)^{(k-1)^2+\frac{k-1}{2}-1}\ln\bigl(1-\frac{z}{\rho_k}\bigr)\Bigr) & \text{for } k \text{ odd},\\[4pt] O\Bigl(\bigl(1-\frac{z}{\rho_k}\bigr)^{(k-1)^2+\frac{k-1}{2}-1}\Bigr) & \text{for } k \text{ even},\end{cases}\qquad z\to\rho_k,$$

in accordance with basic structure theorems for singular expansions of D-finite functions [6]. Furthermore, Theorem 3, eq. (2.4) and the so-called subcritical case of singularity analysis [6, VI.9, p. 411] imply the following result, tailored for our functional equations [10]. Let ρ_k denote the dominant positive real singularity of F_k(z).

Theorem 4. Suppose ϑ_σ(z) is algebraic over K(z), analytic for |z| < δ and satisfies ϑ_σ(0) = 0. Suppose further that γ_{k,σ} is the real, unique solution with minimal modulus < δ of the two equations ϑ_σ(z) = ρ_k and ϑ_σ(z) = −ρ_k. Then

(2.10) $$[z^n]\,F_k(\vartheta_\sigma(z))\sim c_k\,n^{-((k-1)^2+(k-1)/2)}\left(\frac{1}{\gamma_{k,\sigma}}\right)^{n}.$$



The following continuity theorem for discrete limit laws will be used in the proofs of Theorem 6 and Theorem 7. It ensures that, under certain conditions, the pointwise convergence of probability generating functions implies the convergence of their coefficients.

Theorem 5. Let u be an indeterminate and Ω be a set contained in the unit disc, having at least one accumulation point in the interior of the disc. Assume P_n(u) = Σ_{k≥0} p_{n,k} u^k and q(u) = Σ_{k≥0} q_k u^k such that lim_{n→∞} P_n(u) = q(u) holds for each u ∈ Ω. Then we have for any finite k

(2.11) $$\lim_{n\to\infty} p_{n,k}=q_k\qquad\text{and}\qquad \lim_{n\to\infty}\sum_{j\le k}p_{n,j}=\sum_{j\le k}q_j.$$
3. Irreducible substructures



In the following we shall identify a C-tableau with the subsequence of its even-indexed shapes, i.e. the sequence (λ^0, λ^2, . . . , λ^{2n}). Subsequences of two or more consecutive ∅-shapes result from the elementary move (∅, ∅). For instance, the following C-tableau splits at an intermediate ∅-shape into two C-subtableaux.

[Figure: a C-tableau (λ^0, λ^2, . . . , λ^{12}) and its two C-subtableaux; not recoverable from the source.]

We call a sequence (∅, . . . , ∅) of (r + 1) consecutive ∅-shapes a gap of length r. Theorem 1 implies that these ∅-gaps correspond uniquely to the gaps of diagrams introduced in Section 1. A ∗-tableau is a C-tableau with the property λ^i ≠ ∅ for 2 ≤ i ≤ 2n − 2. It is evident that a ∗-tableau corresponds, via the bijection of Theorem 1, to an irreducible k-noncrossing, σ-canonical RNA structure. For instance:

[Figure: a ∗-tableau and the corresponding irreducible structure; not recoverable from the source.]

Obviously, any C-tableau can be uniquely decomposed into a sequence of gaps and ∗-tableaux. For instance, the C-tableau

[Figure: a C-tableau (λ^0, λ^2, . . . , λ^{20}); not recoverable from the source.]

splits into the gap (0, 2), the ∗-tableau over (2, 14) and the gap (14, 20). Let S_{n,j}^{(k)} denote the number of C-tableaux of length 2n with less than k rows containing exactly j ∗-tableaux. Furthermore, let

(3.1) $$U_k(z,u)=\sum_{n\ge 0}\sum_{j\ge 0}S^{(k)}_{n,j}\,u^j z^n,$$

and s_n^{(k)} = Σ_{j≥0} S_{n,j}^{(k)}. Let T_k(z) = T_{k,σ}(z) = Σ_{n≥0} s_n^{(k)} z^n and denote the generating function of ∗-tableaux by R_k(z).

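Lemma 2. The bivariate generating function of the number of C-tableaux of length 2n with less than k rows, which contain exactly j ∗-tableaux, is given by

$$U_k(z,u)=\frac{1}{1-z}\cdot\frac{1}{1-u\Bigl(1-\frac{1}{(1-z)T_k(z)}\Bigr)}.$$

Proof. Since each C-tableau can be uniquely decomposed into a sequence of gaps and ∗-tableaux, we obtain for fixed j

(3.2) $$\sum_{n\ge j} S^{(k)}_{n,j}\,z^n=\frac{1}{1-z}\left(\frac{R_k(z)}{1-z}\right)^{j}.$$

As a result, the bivariate generating function of S_{n,j}^{(k)} is given by

(3.3) $$U_k(z,u)=\sum_{j\ge 0}\sum_{n\ge j}S^{(k)}_{n,j}\,z^n u^j=\sum_{j\ge 0}\frac{1}{1-z}\left(\frac{u\,R_k(z)}{1-z}\right)^{j}=\frac{1}{1-z-u\,R_k(z)}.$$

Setting u = 1 we derive

(3.4) $$T_k(z)=U_k(z,1)=\frac{1}{1-z-R_k(z)},$$

which allows us to express the generating function of ∗-tableaux via T_k(z):

(3.5) $$R_k(z)=1-z-\frac{1}{T_k(z)}.$$

Consequently, U_k(z, u) is given by

(3.6) $$U_k(z,u)=\frac{1}{1-z-u\Bigl(1-z-\frac{1}{T_k(z)}\Bigr)}=\frac{1}{1-z}\cdot\frac{1}{1-u\Bigl(1-\frac{1}{(1-z)T_k(z)}\Bigr)},$$

and the lemma follows. □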
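Eq. (3.5) can be tested on small cases without knowing T_k(z) in closed form. The sketch below enumerates all k-noncrossing diagrams over [n] with arc-length ≥ 4, counts the irreducible ones directly (every boundary between i and i + 1, 1 ≤ i < n, is spanned by an arc, which is the condition λ^{2i} ≠ ∅ of ∗-tableaux), and compares these counts with the coefficients of 1 − z − 1/T_k(z). To keep it short we take σ = 1 (no stack condition); the identity only uses the decomposition into gaps and irreducible pieces, so this simplification should not matter. The enumeration is exponential, and all names are hypothetical.

```python
from itertools import combinations


def diagrams(n, min_arc_length):
    """Yield every diagram over [n] (vertex degree <= 1) whose arcs have length >= min_arc_length."""
    def build(free):
        if not free:
            yield []
            return
        first, rest = free[0], free[1:]
        yield from build(rest)  # the first free vertex stays isolated
        for idx, partner in enumerate(rest):
            if partner - first >= min_arc_length:
                for arcs in build(rest[:idx] + rest[idx + 1:]):
                    yield [(first, partner)] + arcs
    return build(tuple(range(1, n + 1)))


def k_noncrossing(arcs, k):
    """True if no k arcs are mutually crossing."""
    return not any(all(a[0] < b[0] < a[1] < b[1] for a, b in combinations(group, 2))
                   for group in combinations(sorted(arcs), k))


def irreducible(n, arcs):
    """Every boundary (i, i + 1), 1 <= i < n, is spanned by an arc; single vertices excluded."""
    return n >= 2 and all(any(a <= i < b for a, b in arcs) for i in range(1, n))


def check_lemma_2(k, n_max=10, min_arc_length=4):
    """Compare irreducible counts r_n with [z^n](1 - z - 1/T_k(z)), cf. eq. (3.5)."""
    s, r = [], []
    for n in range(n_max + 1):
        ds = [d for d in diagrams(n, min_arc_length) if k_noncrossing(d, k)]
        s.append(len(ds))
        r.append(sum(1 for d in ds if irreducible(n, d)))
    inv = [1]  # power-series coefficients of 1/T_k(z); uses s[0] = 1
    for n in range(1, n_max + 1):
        inv.append(-sum(s[i] * inv[n - i] for i in range(1, n + 1)))
    predicted = [(n == 0) - (n == 1) - inv[n] for n in range(n_max + 1)]
    return predicted == r, s, r


print(check_lemma_2(2)[0], check_lemma_2(3)[0])
```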
Setting g(z) = 1/(1 − z) and h(z) = 1 − 1/((1 − z)T_k(z)), Lemma 2 implies

(3.7) $$U_k(z,u)=g(z)\cdot\frac{1}{1-u\,h(z)}=g(z)\,g\bigl(u\,h(z)\bigr).$$

Let ξ_n^{(k)} be a r.v. such that P(ξ_n^{(k)} = i) = S_{n,i}^{(k)}/s_n^{(k)}, and let ρ_p and ρ_w denote the radius of convergence of the power series p(z) and w(z), respectively. We denote τ_w = lim_{z→ρ_w^−} w(z) and call a function F(z, u) = p(u · w(z)) subcritical if and only if τ_w < ρ_p.

Theorem 6. Let α_k be the real positive dominant singularity of T_k(z) and τ_k = 1 − 1/((1 − α_k)T_k(α_k)). Then the r.v. ξ_n^{(k)} satisfies the discrete limit law

(3.8) $$\lim_{n\to\infty}\mathbb{P}\bigl(\xi^{(k)}_n=i\bigr)=q_i,\qquad\text{where } q_i=\frac{(1-\tau_k)^2}{\tau_k}\,i\,\tau_k^{\,i}.$$

That is, ξ_n^{(k)} is determined by the density function of a Γ(−ln τ_k, 2)-distribution. Furthermore, the probability generating function of the limit distribution q(u) = Σ_{i≥1} q_i u^i satisfies q(u) = u(1 − τ_k)^2/(1 − τ_k u)^2.

Proof. Since g(z) = 1/(1 − z) and h(z) = 1 − 1/((1 − z)T_k(z)) have nonnegative coefficients (by eq. (3.5), h(z) = R_k(z)/(1 − z)) and h(0) = 0, the composition g(h(z)) is well defined as a formal power series. According to eq. (3.7) we may express U_k(z, u) as U_k(z, u) = g(z) g(u h(z)). For z = α_k we have τ_k = 1 − 1/((1 − α_k)T_k(α_k)) < 1 = ρ_g, i.e. we are given the subcritical case.

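Claim 1. h(z) has a singular expansion at its dominant singularity z = α_k, and there exists some constant c_k > 0 such that, for z → α_k,

(3.9) $$h(z)=\begin{cases}\tau_k-c_k\bigl(1-\frac{z}{\alpha_k}\bigr)^{\mu}\ln\bigl(1-\frac{z}{\alpha_k}\bigr)\bigl(1+o(1)\bigr) & \text{for } k\equiv 1 \bmod 2,\\[4pt] \tau_k-c_k\bigl(1-\frac{z}{\alpha_k}\bigr)^{\mu}\bigl(1+o(1)\bigr) & \text{for } k\equiv 0 \bmod 2,\end{cases}\qquad \mu=(k-1)^2+\frac{k-1}{2}-1.$$

Since F_k(z) is D-finite, the composition F_k(ϑ(z)), where ϑ(z) = √(w_σ(z)) z/v_σ(z) is algebraic with ϑ(0) = 0, is D-finite […]. As a result, T_k(z) is D-finite, being a product of the two D-finite functions 1/v_σ(z) and F_k(ϑ(z)). We view 1/T_k(z) as the composition of the outer function H(z) = 1/(1 + z) and the inner function T_k(z) − 1, where T_k(0) − 1 = 0. We conclude from this that h(z) = 1 − 1/((1 − z)T_k(z)) is D-finite.

h(z) is analytic at z = 0, and its D-finiteness guarantees that h(z) has an analytic continuation in some simply connected Δ_{α_k}-domain containing zero [21]. Consequently, the singular expansion of h(z) at z = α_k does exist and

$$h(z)=\tau_k+h'(\alpha_k)(z-\alpha_k)+\frac{h''(\alpha_k)}{2!}(z-\alpha_k)^2+\frac{h'''(\alpha_k)}{3!}(z-\alpha_k)^3+\cdots.$$

We next observe that Theorem 3, the singular expansion of F_k(z) at ρ_k and Theorem 4 imply

(3.10) $$T_k(z)=\begin{cases}O\Bigl(\bigl(1-\frac{z}{\alpha_k}\bigr)^{(k-1)^2+\frac{k-1}{2}-1}\ln\bigl(1-\frac{z}{\alpha_k}\bigr)\Bigr) & \text{for } k \text{ odd},\\[4pt] O\Bigl(\bigl(1-\frac{z}{\alpha_k}\bigr)^{(k-1)^2+\frac{k-1}{2}-1}\Bigr) & \text{for } k \text{ even},\end{cases}\qquad z\to\alpha_k.$$

Suppose first that k ≡ 1 mod 2 and set μ = (k − 1)^2 + (k − 1)/2 − 1. For z → α_k, eq. (3.10) guarantees

(3.11) $$T_k(z)=\ell(\alpha_k)\Bigl(1-\frac{z}{\alpha_k}\Bigr)^{\mu}\ln\Bigl(1-\frac{z}{\alpha_k}\Bigr)\bigl(1+o(1)\bigr)+T_k(\alpha_k),$$

where ℓ(α_k) < 0. Since T_k(z) is a power series with positive coefficients and μ ≥ 1/2 for any k ≥ 2, we have T_k(α_k) < ∞. Accordingly, we obtain for z → α_k

(3.12) $$T_k'(z)=-\frac{\mu\,\ell(\alpha_k)}{\alpha_k}\Bigl(1-\frac{z}{\alpha_k}\Bigr)^{\mu-1}\ln\Bigl(1-\frac{z}{\alpha_k}\Bigr)\bigl(1+o(1)\bigr),$$

(3.13) $$T_k''(z)=\frac{\mu(\mu-1)\,\ell(\alpha_k)}{\alpha_k^2}\Bigl(1-\frac{z}{\alpha_k}\Bigr)^{\mu-2}\ln\Bigl(1-\frac{z}{\alpha_k}\Bigr)\bigl(1+o(1)\bigr).$$

Eq. (3.12) and eq. (3.13), together with the analogous expressions for the higher derivatives of T_k(z), imply that each term h^{(m)}(α_k)(z − α_k)^m/m!, m ≥ 1, of the above expansion contributes a constant multiple of ℓ(α_k)(1 − z/α_k)^μ ln(1 − z/α_k)(1 + o(1)), where the constants are expressed via μ, α_k and T_k(α_k). Collecting these contributions we arrive at

$$h(z)=\tau_k-c_k\Bigl(1-\frac{z}{\alpha_k}\Bigr)^{\mu}\ln\Bigl(1-\frac{z}{\alpha_k}\Bigr)\bigl(1+o(1)\bigr),\qquad\text{where } c_k>0.$$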



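The case k ≡ 0 mod 2 is proved analogously, and Claim 1 follows. U_k(z, 1), as the product of g(z) and g(h(z)) where h(0) = 0, is D-finite and has a singular expansion at z = α_k. Without loss of generality, we may restrict ourselves in the following to the case k ≡ 1 mod 2 and proceed by computing

$$\begin{aligned}U_k(z,1)&=g(\alpha_k)g(\tau_k)+\bigl(g\cdot g(h)\bigr)'(\alpha_k)(z-\alpha_k)+\frac{\bigl(g\cdot g(h)\bigr)''(\alpha_k)}{2!}(z-\alpha_k)^2+\cdots\\&=g(\alpha_k)g(\tau_k)-c_k\,g(\alpha_k)g'(\tau_k)\Bigl(1-\frac{z}{\alpha_k}\Bigr)^{\mu}\ln\Bigl(1-\frac{z}{\alpha_k}\Bigr)\bigl(1+o(1)\bigr).\end{aligned}$$

Therefore we derive

(3.14) $$[z^n]\,U_k(z,1)=-c_k\,g(\alpha_k)g'(\tau_k)\,\alpha_k^{-n}n^{-\mu-1}\bigl(1+o(1)\bigr).$$

For any fixed u ∈ (0, 1) the singular expansion of U_k(z, u) at z = α_k is given by

$$\begin{aligned}U_k(z,u)&=g(\alpha_k)g(u\tau_k)+\bigl(g\cdot g(uh)\bigr)'(\alpha_k)(z-\alpha_k)+\frac{\bigl(g\cdot g(uh)\bigr)''(\alpha_k)}{2!}(z-\alpha_k)^2+\cdots\\&=g(\alpha_k)g(u\tau_k)-c_k\,u\,g(\alpha_k)g'(u\tau_k)\Bigl(1-\frac{z}{\alpha_k}\Bigr)^{\mu}\ln\Bigl(1-\frac{z}{\alpha_k}\Bigr)\bigl(1+o(1)\bigr),\end{aligned}$$

and we consequently obtain, with τ_k = 1 − 1/((1 − α_k)T_k(α_k)),

(3.15) $$\lim_{n\to\infty}\frac{[z^n]\,U_k(z,u)}{[z^n]\,U_k(z,1)}=\frac{u\,g'(u\tau_k)}{g'(\tau_k)}=\frac{u(1-\tau_k)^2}{(1-\tau_k u)^2}.$$

In view of eq. (3.15) and [u^i] u(1 − τ_k)^2/(1 − τ_k u)^2 = (1 − τ_k)^2 i τ_k^{i−1} = q_i, Theorem 5 implies the discrete limit law

(3.16) $$\lim_{n\to\infty}\mathbb{P}\bigl(\xi^{(k)}_n=i\bigr)=\lim_{n\to\infty}\frac{S^{(k)}_{n,i}}{s^{(k)}_n}=q_i,\qquad\text{where } q_i=\frac{(1-\tau_k)^2}{\tau_k}\,i\,\tau_k^{\,i}.$$

Since the density function of a Γ(λ, r)-distribution is given by

(3.17) $$f_{\lambda,r}(x)=\begin{cases}\dfrac{\lambda^r}{\Gamma(r)}\,x^{r-1}e^{-\lambda x} & x>0,\\[4pt] 0 & x\le 0,\end{cases}$$

where λ > 0 and r > 0, we obtain, setting r = 2 and λ = −ln τ_k > 0,

$$\lim_{n\to\infty}\mathbb{P}\bigl(\xi^{(k)}_n=i\bigr)=\frac{(1-\tau_k)^2}{\tau_k}\bigl(i\,\tau_k^{\,i}\bigr)=\frac{(1-\tau_k)^2}{\tau_k\,(\ln\tau_k)^2}\,(\ln\tau_k)^2\,i\,\tau_k^{\,i}=\frac{(1-\tau_k)^2}{\tau_k\,(\ln\tau_k)^2}\,f_{-\ln\tau_k,2}(i),$$

and the proof of the theorem is complete. □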
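To make the limit law concrete, the sketch below evaluates q_i, the probability generating function q(u) and the rescaled Γ(−ln τ_k, 2)-density for τ_k = 0.2241, the value used for secondary structures in Figure 6. It checks numerically that the q_i sum to one, that they agree with ((1 − τ_k)^2/(τ_k(ln τ_k)^2))·f_{−ln τ_k,2}(i), and that their mean equals (1 + τ_k)/(1 − τ_k), which follows from q'(1) (a short computation of ours). Function names are hypothetical.

```python
from math import exp, gamma, log


def q(i, tau):
    """Limit probability q_i = (1 - tau)^2 / tau * i * tau^i of Theorem 6, for i >= 1."""
    return (1 - tau) ** 2 / tau * i * tau ** i


def pgf(u, tau):
    """Probability generating function q(u) = u (1 - tau)^2 / (1 - tau u)^2."""
    return u * (1 - tau) ** 2 / (1 - tau * u) ** 2


def gamma_density(x, lam, r):
    """Density f_{lam, r}(x) of the Gamma(lam, r)-distribution, eq. (3.17)."""
    return lam ** r * x ** (r - 1) * exp(-lam * x) / gamma(r) if x > 0 else 0.0


tau = 0.2241  # value quoted for secondary structures (n = 75) in Figure 6
lam = -log(tau)
scale = (1 - tau) ** 2 / (tau * log(tau) ** 2)
probs = [q(i, tau) for i in range(1, 200)]  # the tail beyond i = 199 is negligible

print(sum(probs))                                   # close to 1
print(sum(i * p for i, p in enumerate(probs, 1)))   # mean, (1 + tau) / (1 - tau)
print([round(q(i, tau), 4) for i in range(1, 5)])
print([round(scale * gamma_density(i, lam, 2), 4) for i in range(1, 5)])  # same values
```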
4. The limit distribution of nontrivial returns

Let β_n^{(k)} denote the number of C-tableaux of length 2n, which are in correspondence to k-noncrossing, σ-canonical RNA structures. Let β_{n,i}^{(k)} denote the number of C-tableaux of length 2n having exactly i ∅-shapes contained in the sequence (λ^2, . . . , λ^{2n}). Let W_k(z, u) denote the bivariate generating function of β_{n,i}^{(k)}. Then β_{n,i}^{(k)} = [z^n u^i] W_k(z, u) and W_k(z, u) = Σ_{j≥0} Σ_{n≥j} β_{n,j}^{(k)} z^n u^j. Furthermore, we set β_n^{(k)} = [z^n] W_k(z, 1).

Lemma 3. The bivariate generating function of the number of C-tableaux of length 2n with less than k rows, containing exactly i ∅-shapes, is given by

(4.1) $$W_k(z,u)=\frac{1}{1-u\Bigl(1-\frac{1}{T_k(z)}\Bigr)}.$$

Proof. Suppose the C-tableau (λ^0, . . . , λ^{2n}) contains exactly i ∅-shapes in (λ^2, . . . , λ^{2n}). These ∅-shapes split (λ^0, . . . , λ^{2n}) uniquely into exactly i C-subtableaux, each of which is either a gap of length 1 or a ∗-tableau. We conclude from this that for fixed j

(4.2) $$\sum_{n\ge j}\beta^{(k)}_{n,j}\,z^n=\bigl(z+R_k(z)\bigr)^{j}$$

holds. Therefore the bivariate generating function W_k(z, u) satisfies

$$W_k(z,u)=\sum_{j\ge 0}\sum_{n\ge j}\beta^{(k)}_{n,j}\,z^n u^j=\sum_{j\ge 0}\bigl(z+R_k(z)\bigr)^{j}u^j=\frac{1}{1-u\bigl(z+R_k(z)\bigr)}=\frac{1}{1-u\Bigl(1-\frac{1}{T_k(z)}\Bigr)},$$

where the last equality follows from eq. (3.5), proving the lemma. □

We set g(z) = 1/(1 − z) and h(z) = 1 − 1/T_k(z), and let η_n^{(k)} denote the random variable with probability distribution P(η_n^{(k)} = i) = β_{n,i}^{(k)}/β_n^{(k)}. In case of W_k(z, u) = g(u h(z)) we have ρ_g = 1 while τ_h < 1, i.e. we are given the subcritical case. In our next theorem, we prove that the limit distribution of η_n^{(k)} is determined by the density function of a Γ(λ, r)-distribution.
Theorem 7. Let α_k denote the real, positive, dominant singularity of T_k(z) and let τ_k = 1 − 1/T_k(α_k). Then the r.v. η_n^{(k)} satisfies the discrete limit law

(4.3) $$\lim_{n\to\infty}\mathbb{P}\bigl(\eta^{(k)}_n=i\bigr)=q_i,\qquad\text{where } q_i=\frac{(1-\tau_k)^2}{\tau_k}\,i\,\tau_k^{\,i}.$$

That is, η_n^{(k)} is determined by the density function of a Γ(−ln τ_k, 2)-distribution, and the limit distribution has the probability generating function q(u) = Σ_{i≥1} q_i u^i = u(1 − τ_k)^2/(1 − τ_k u)^2.



Proof. Since g(z) = 1/(1 − z) and h(z) = 1 − 1/T_k(z) have nonnegative coefficients (by eq. (3.5), h(z) = z + R_k(z)) and h(0) = 0, the composition g(h(z)) is again a power series. W_k(z, u) = g(u h(z)) has a singularity at z = α_k and τ_k = 1 − 1/T_k(α_k) < 1 = ρ_g, whence we are given the subcritical case. Furthermore we observe that, regardless of the singularity arising from T_k(z) = 0, the dominant singularity of h(z) = 1 − 1/T_k(z) equals the dominant singularity of T_k(z), i.e., z = α_k.

Claim 1. h(z) has a singular expansion at z = α_k, and there exists some constant c_k > 0 such that

(4.4) $$h(z)=\begin{cases}\tau_k-c_k\bigl(1-\frac{z}{\alpha_k}\bigr)^{\mu}\ln\bigl(1-\frac{z}{\alpha_k}\bigr)\bigl(1+o(1)\bigr) & \text{for } k\equiv 1 \bmod 2,\\[4pt] \tau_k-c_k\bigl(1-\frac{z}{\alpha_k}\bigr)^{\mu}\bigl(1+o(1)\bigr) & \text{for } k\equiv 0 \bmod 2,\end{cases}$$

for z → α_k and μ = (k − 1)^2 + (k − 1)/2 − 1.

The proof of Claim 1 is analogous to that of Theorem 6. Again, we restrict ourselves to the case k ≡ 1 mod 2. W_k(z, 1) = g(h(z)) is D-finite and its Taylor expansion at z = α_k is given by

$$\begin{aligned}W_k(z,1)&=g(\tau_k)+(g\circ h)'(\alpha_k)(z-\alpha_k)+\frac{(g\circ h)''(\alpha_k)}{2!}(z-\alpha_k)^2+\cdots\\&=g(\tau_k)-c_k\,g'(\tau_k)\Bigl(1-\frac{z}{\alpha_k}\Bigr)^{\mu}\ln\Bigl(1-\frac{z}{\alpha_k}\Bigr)\bigl(1+o(1)\bigr).\end{aligned}$$

Therefore we arrive at

(4.5) $$[z^n]\,W_k(z,1)=c_k\,g'(\tau_k)\,\alpha_k^{-n}n^{-\mu-1}\bigl(1+o(1)\bigr).$$

For any fixed u ∈ (0, 1) the singular expansion of W_k(z, u) = g(u h(z)) at z = α_k is given by

$$\begin{aligned}W_k(z,u)&=g(u\tau_k)+\bigl(g(uh)\bigr)'(\alpha_k)(z-\alpha_k)+\frac{\bigl(g(uh)\bigr)''(\alpha_k)}{2!}(z-\alpha_k)^2+\cdots\\&=g(u\tau_k)-c_k\,u\,g'(u\tau_k)\Bigl(1-\frac{z}{\alpha_k}\Bigr)^{\mu}\ln\Bigl(1-\frac{z}{\alpha_k}\Bigr)\bigl(1+o(1)\bigr),\end{aligned}$$

from which we conclude

(4.6) $$\lim_{n\to\infty}\frac{[z^n]\,W_k(z,u)}{[z^n]\,W_k(z,1)}=\frac{u\,g'(u\tau_k)}{g'(\tau_k)}=\frac{u(1-\tau_k)^2}{(1-\tau_k u)^2},\qquad\text{where } \tau_k=1-\frac{1}{T_k(\alpha_k)}.$$

In view of [u^i] u(1 − τ_k)^2/(1 − τ_k u)^2 = ((1 − τ_k)^2/τ_k) i τ_k^i = q_i, Theorem 5 implies the discrete limit law

(4.7) $$\lim_{n\to\infty}\mathbb{P}\bigl(\eta^{(k)}_n=i\bigr)=\lim_{n\to\infty}\frac{\beta^{(k)}_{n,i}}{\beta^{(k)}_n}=q_i.$$

In view of eq. (3.17), setting r = 2 and λ = −ln τ_k > 0, we analogously obtain

$$\lim_{n\to\infty}\mathbb{P}\bigl(\eta^{(k)}_n=i\bigr)=\frac{(1-\tau_k)^2}{\tau_k}\bigl(i\,\tau_k^{\,i}\bigr)=\frac{(1-\tau_k)^2}{\tau_k\,(\ln\tau_k)^2}\,f_{-\ln\tau_k,2}(i),$$

and Theorem 7 is proved. □